ABSTRACT

The purpose of this study was to examine the
reliability and validity of a basal reading series mastery test.
Subjects were 25 fourth graders, who were tested once on the SRA
Reading Achievement Test, twice on the Scott-Foresman End-of-Book 9
Criterion-referenced Test (CRT), and once on the Word Reading Test.
Traditional psychometric correlational analyses as well as strategies
specifically designed for examining the adequacy of
criterion-referenced tests were applied to the data to investigate
the following dimensions of the technical adequacy of the CRT:
consistency of student performance across two administrations of the
CRT, criterion-related validity of the CRT scores with respect to two
other measures of reading proficiency, and criterion-related validity
of the CRT mastery/nonmastery decisions with respect to pre/post
instructional status. Results indicated that reliability and
validity were acceptable for the total test and the scale scores, with
the exception of the Literary Understanding/Appreciation scale and,
in some cases, the Word Identification scale. Implications for the
development and use of criterion-referenced tests are discussed.

The Technical Adequacy of a Basal Series Mastery Test:
The Scott-Foresman Reading Program 

Within the past decade, interest in and the use of
criterion-referenced (CR) testing as a tool for evaluating the effects of
instructional programs has expanded. In contrast to traditional 
global norm-referenced achievement tests, which typically have poor 
content validity with respect to classroom curricula, CR instruments 
are isomorphic with instructional programs and potentially useful for 
measuring the extent to which individuals or groups have mastered 
specific educational objectives. 

Despite the increased focus on CR tests in the schools, there has 
been a lack of concomitant investigation of the reliability and
validity of such instruments. Teachers who create their own CR tests
rarely possess resources to conduct expensive, time-consuming
reliability and validity studies. Additionally, Tindal, Shinn, Fuchs,
Fuchs, Deno, and Germann (1983) documented that publishers of
commercial CR instruments typically fail to address technical adequacy
at all; and, when adequacy is examined, developers rely predominantly
on traditional psychometric correlational analyses, which have been
criticized for use with CR tests (Popham & Husek, 1969). Therefore,
although CR tests may appear to be useful for evaluating the effects
of instructional programs because of strong content validity, it
remains unclear whether they are accurate (reliable) or meaningful
(valid) for their intended purposes. 

In response to this problem, researchers recently have begun the 
process of investigating the psychometric characteristics of 
commercially available basal reading series CR tests. Fuchs, Tindal,
Shinn, Fuchs, Deno, and Germann (1983) examined the test-retest
reliability and criterion validity of a Ginn 720 basal mastery test
and found its quality variable. In most analyses, the study skills
subtests appeared inadequate, the quality of the comprehension
subtests varied, and the decoding and vocabulary subtests and the
total score were acceptable. Tindal et al. (1983) determined that a 
mastery test from the Houghton-Mifflin reading series was less than 
adequate; the decoding and comprehension test scales were both 
unreliable and invalid. These findings suggest that content and face 
validity are necessary but insufficient dimensions of CR test 
adequacy, and that test consumers must seek empirical validation of 
each CR test before relying on such test data for making instructional 
decisions.

The purpose of the current study was to extend the work of Tindal
et al. (1983) and Fuchs et al. (1983) by examining the reliability and
validity of another basal series mastery test, that of Scott-Foresman
(1981). In doing so, the present study sought to increase the data 
base concerning the adequacy of CR tests in order to provide relevant 
information not only to consumers of this specific measure but also to 
users of other CR instruments for which technical data are still 
unavailable. 

Method 

Subjects 

Subjects were 25 students (13 M, 12 F) from one fourth grade
class, located in a school district of a rural midwestern cooperative.
The students' mean reading percentile rank was 65.6 (SD = 31.93) as
measured on the Science Research Associates (SRA) Reading Achievement
Test. Only those students for whom there were no missing data were 
included in any given analysis. 
Measures 

Three measures of reading performance were used in the study: a
basal series criterion-referenced test, a global norm-referenced test, 
and a curriculum-based word reading test. 

Criterion-referenced test. Four scales of the End-of-Book 9
Criterion-referenced Test (CRT; Johns, 1981) of the Scott-Foresman
reading series were employed as measures. Each of the four scales,
Word Identification, Comprehension, Study and Research, and Literary
Understanding and Appreciation, is composed of subtests. Table 1
lists the subtests constituting each scale and provides brief
descriptions of tasks the examinee is required to do within each
subtest. This CRT includes between 12 and 43 items per scale, and
cutoff scores are established at 79% and 83% correct responses. For
the purpose of this study, two subtests, Fiction and Nonfiction from
the Comprehension scale and Summarizing from the Study and Research
scale, were omitted; they are not described in Table 1. With these
omissions, items per scale ranged from 12 to 41 and the mastery cutoff 
scores fell between 76% and 83% correct responses. 



Insert Table 1 about here 



Norm-referenced test. The Science Research Associates (SRA)
Reading Achievement Test (Naslund, Thorpe, & Lefever, 1978) is
composed of two subtests: vocabulary and comprehension. In the
vocabulary section, examinees are required to select, from four
alternatives, a synonym for an underlined word in a sentence. In the
comprehension section, examinees read 200-300 word passages and answer
questions in a multiple choice format. Total test score is based on a
linear combination of the two subtests. Internal consistency 
reliability was reported at .88 (Salvia & Ysseldyke, 1981). 

Curriculum-based word reading test. The Word Reading Test (Deno,
Mirkin, & Chiang, 1982) requires children to read aloud passages and
isolated word lists and is scored in terms of average numbers of words
correct and incorrect over two alternate forms of the Isolated Word
Reading and Passage Reading scales. The 200-word passages are drawn
randomly from a student's grade appropriate basal reading book; the
150-word lists sample words randomly from the basals, with 60% of
words drawn from the student's grade appropriate level and 40% sampled
equally from all previous levels. For the Passage and Isolated Word
Reading Tests, test-retest and alternate form reliabilities were at
least .90 (Fuchs, Deno, & Marston, in press; Fuchs, Wesson, Tindal,
Mirkin, & Deno, 1981). 
Procedure 

All students were tested in groups by a school psychologist on
the SRA Reading Achievement Test, and by their classroom teachers on
the CRT. The Word Reading Test was administered individually by
trained aides. Standardized administration procedures were adhered to
on all tests. Testing time ranged from 60 to 90 minutes for the SRA
Test, 60 to 90 minutes for the CRT, and five to six minutes for the
Word Reading Test. Students were tested on the following measures in
the following order within a 2-week time period: the CRT, the SRA
Reading Achievement Test, the Word Reading Test, and the CRT again.
Data Analysis 

Consistency of performance on two administrations of the same
test. Consistency of students' performance on the CRT was assessed in
three ways. First, traditional test-retest reliability was determined 
by correlating scores from the two administrations of the CRT. The 
other two analysis strategies were designed specifically for 
criterion-referenced measures (see Millman, 1974). In the first of 
these, consistency of students' scale scores was determined by (a) 
computing individuals' percentage correct scores on each scale for 
each administration of the CRT, (b) calculating for each individual 
his/her difference score across the two administrations of each scale, 
and (c) determining the percentages of examinees having each possible 
difference score on each scale. In the second strategy, consistency 
of mastery-nonmastery decisions on scales was determined by dividing 
the difference between observed and chance proportions of agreements 
in decisions by the maximum value that difference could assume. (The 
chance proportion of agreements was computed by multiplying and then 
summing the marginal proportions of the same decision categories for 
the two administrations, as done in a chi-square test of association.) 
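
The chance-corrected agreement index described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' computation: the function name `agreement_indices` and the example counts are hypothetical, and "the maximum value that difference could assume" is read here as the largest observed-minus-chance difference attainable with the marginal proportions held fixed. Replacing that maximum with 1 minus the chance proportion would give Cohen's kappa instead.

```python
def agreement_indices(table):
    """Return (observed, chance, corrected) agreement for a square table.

    table[i][j] is the number of examinees placed in decision category i on
    the first administration and category j on the second
    (for example, 0 = mastery, 1 = nonmastery).
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Marginal proportions for each administration.
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Observed proportion of examinees given the same decision twice.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: multiply matching marginals, then sum.
    chance = sum(row_p[i] * col_p[i] for i in range(k))
    # ASSUMPTION: largest agreement attainable with these marginals fixed.
    max_observed = sum(min(row_p[i], col_p[i]) for i in range(k))
    if max_observed == chance:
        raise ValueError("corrected agreement is undefined for these marginals")
    corrected = (observed - chance) / (max_observed - chance)
    return observed, chance, corrected


# Hypothetical counts for 24 examinees (not data from this study).
example = [[10, 2],
           [1, 11]]
observed, chance, corrected = agreement_indices(example)
```

Because the source does not report the underlying decision tables, this sketch cannot be checked against the reported proportions; it shows only the arithmetic of the correction.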

Criterion-related validity. The criterion-related validity of
the CRT was determined in two ways. The traditional psychometric
strategy of correlating scores on the measure of interest (CRT) with
criterion measures was used. The SRA Reading Achievement Test and the
Word Reading Test were employed as the criterion measures.
Additionally, chi-square statistical tests were applied to contingency
tables wherein test mastery-nonmastery represented one dimension of
each table and pre-post instructional status represented the other
dimension. Percentages of misclassifications and phi coefficients
supplemented the chi-square tests.
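
As a rough sketch of the contingency-table statistics named above, the following computes a chi-square value, a phi coefficient, and a percentage of misclassifications for a 2 x 2 table. Several things here are assumptions rather than details given in the source: the function name `two_by_two_stats`, the cell layout, the absence of a continuity correction, and the example counts, which are hypothetical.

```python
import math


def two_by_two_stats(a, b, c, d):
    """Chi-square, phi, and percent misclassified for a 2 x 2 table.

    ASSUMED cell layout (not specified in the source):
        a = pre-instruction,  nonmastery  (consistent with status)
        b = pre-instruction,  mastery     (misclassified)
        c = post-instruction, nonmastery  (misclassified)
        d = post-instruction, mastery     (consistent with status)
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    cross = a * d - b * c
    # Pearson chi-square without a continuity correction (an assumption).
    chi_square = n * cross ** 2 / denom
    # Phi is the signed square root of chi-square / n.
    phi = cross / math.sqrt(denom)
    percent_misclassified = 100.0 * (b + c) / n
    return chi_square, phi, percent_misclassified


# Hypothetical counts for 23 classifications (not data from this study).
chi_square, phi, pct = two_by_two_stats(9, 2, 3, 9)
```

For a 2 x 2 table the two statistics are tied together by phi squared = chi-square / N, which gives a quick consistency check on any reported pair of values.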

Results 

Table 2 is a display of students' mean scores and standard
deviations on each scale and for the total score of the CRT, on the
subtest and total scores of the SRA Reading Achievement Test, and on
the isolated word reading and passage reading scales of the Word
Reading Test. 



Insert Table 2 about here 



Consistency of Performance on Administrations of the Same Test 

Test-retest reliability correlations on scales of the CRT are 
displayed in Table 3. All coefficients for the test scales and total
test were at least .90, with the exception of the Literary
Understanding/Appreciation scale, which had a coefficient of .68. 
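
The test-retest coefficients above come from correlating scores across the two administrations. A minimal sketch follows, assuming a Pearson product-moment correlation (the source says only that scores were correlated); the function name `pearson_r` and the score lists are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical raw scores on one scale for five examinees, tested twice.
first_administration = [30, 25, 34, 28, 19]
second_administration = [31, 24, 35, 30, 20]
r = pearson_r(first_administration, second_administration)
```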



Insert Table 3 about here 



The second analysis of the consistency of performance involved
calculating the percentages of examinees who had different percentage
correct scores across the two administrations of the CRT. Figures 1
and 2 are graphic displays of the percentages of examinees displaying
various difference scores on each scale of the CRT; Table 4 summarizes
the information illustrated on the graphs. The range of difference
scores on the scales fell between 0 and 42%. The percentage of
examinees with 0% difference scores on two administrations ranged from
13 on the Comprehension scale to 25 on the Literary
Understanding/Appreciation scale.
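
The difference-score tabulation summarized in Table 4 can be sketched as follows. This is an illustration under assumptions: the function names, the rounding of each difference to a whole percentage point before binning, and the example scores are all hypothetical; only the interval boundaries are taken from Table 4.

```python
from collections import Counter

# Difference-score intervals (in percentage points) used in Table 4.
BINS = [(0, 0), (1, 7), (8, 14), (15, 24), (25, 34), (35, 44)]


def percent_correct(raw_score, n_items):
    """Convert a raw score to a percentage correct score."""
    return 100.0 * raw_score / n_items


def difference_distribution(first, second, n_items):
    """Percentage of examinees whose difference score falls in each interval.

    first and second hold raw scores from the two administrations, in the
    same examinee order. ASSUMPTION: each absolute difference is rounded to
    a whole percentage point before it is assigned to an interval.
    Differences above 44 points fall outside every interval.
    """
    counts = Counter()
    for a, b in zip(first, second):
        diff = round(abs(percent_correct(a, n_items) - percent_correct(b, n_items)))
        for low, high in BINS:
            if low <= diff <= high:
                counts[(low, high)] += 1
                break
    n = len(first)
    return {interval: round(100 * counts[interval] / n) for interval in BINS}


# Hypothetical raw scores for four examinees on a 12-item scale.
distribution = difference_distribution([10, 9, 8, 6], [10, 10, 6, 10], n_items=12)
```

On a 12-item scale a change of a single item is about 8.3 percentage points, so under this binning no examinee can land in the 1-to-7 interval; short scales therefore produce coarser difference distributions than long ones.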



Insert Figures 1-2 and Table 4 about here 

The third analysis of the consistency of performance addressed 
consistency of mastery-nonmastery decisions across the two 
administrations of the CRT. Table 5 is a display of the uncorrected
and corrected proportions of examinees placed into the same decision
category on the two administrations. On the Word Identification,
Comprehension, and Study and Research scales, the corrected
proportions were high, ranging from a proportion of agreement on the
Study and Research scale of [value illegible] higher than chance to a
proportion of agreement on the Word Identification scale of 96%
greater than chance. On the Literary Understanding/Appreciation scale,
the proportion of agreement was lower, with a proportion of agreement
of 57% better than chance.



Insert Table 5 about here



Criterion-related Validity

Correlational analyses were conducted between the CRT scales and
two criterion measures, the SRA Reading Achievement Test and the Word
Reading Test. Correlations between the CRT scales and the SRA subtest
and total test scores are displayed in Table 6. They ranged from .48
to .90 when CRT scale and SRA vocabulary subtest scores were involved;
from .59 to .87 when CRT scale and SRA comprehension subtest scores
were employed; and from .55 to .89 when CRT scale and SRA total scores
were used. For the CRT total score, correlations ranged between .92
and .95. The median correlation for the CRT Word Identification scale
was .62; for the CRT Comprehension scale, .86; for the CRT Study and
Research scale, .89; and for the Literary Understanding/Appreciation
scale, .55.



Insert Table 6 about here



Correlations between the CRT scales and the Word Reading Test 
scale scores are displayed in Table 7. They ranged from .42 to .73 
when isolated word reading scores and CRT scale scores were involved,
and from .55 to .76 when passage reading scores and CRT scale scores
were employed. For the CRT total score, correlations were .77 and
.84, respectively.



Insert Table 7 about here



Criterion validity also was examined by investigating the relation
between mastery-nonmastery classification on the CRT and actual pre-post
instructional status. Relevant chi-square values, phi coefficients,
and percentages of misclassified students are displayed in Table 8.
The highest percentages of misclassifications occurred with the Word
Identification (48%) and Literary Understanding/Appreciation (35%)
scales; lower percentages were found for the Study and Research scale
and the total score (22%). The percentage of misclassified students
on the Comprehension scale was 26.

Insert Table 8 about here 

Discussion 

The purpose of the current study was to describe the reliability 
and validity of a basal reading series criterion-referenced mastery 
test. The study examined two aspects of the technical adequacy of the 
Scott-Foresman End-of-Book 9 Criterion-referenced test: (a) the 
consistency of students' performance on two administrations of the
test, and (b) the criterion validity of the test with respect to two 
other measures of reading proficiency that have demonstrated 
psychometric strength. On these indices, the total score and all 
scale scores, with the exception of the Literary Understanding scale
and, in some cases, the Word Identification scale, seemed adequate.

Test-retest reliability coefficients indicated that, when the CRT
was administered twice within a short time interval, students'
performance was somewhat inconsistent on the Literary Understanding
and Appreciation scale, with the correlation falling below the
acceptable range for making even group decisions (Salvia & Ysseldyke,
1981). Nevertheless, for the remaining scales and the total score,
correlations were high and fell into the acceptable range for 
individual decision making. 

The pattern of results of this traditional correlational analysis
of consistency of student performance across testings was corroborated
by the criterion-referenced strategy of examining the proportions of
examinees consistently classified into the same decision category. As
with the correlational analyses, statistics were generally high. All
corrected proportions fell above 84% better-than-chance agreement
except the Literary Understanding/Appreciation scale score, for which
the corrected proportion fell below 60% better than chance.

Inspection of the consistency of test scores displayed in Figures
1 and 2 and in Table 4 reveals that results of the second
criterion-referenced strategy for examining test consistency also
corroborated the correlational results. The average percentage of
subjects obtaining the same score across all scales was 18.0.
Interestingly, the percentage for the Literary
Understanding/Appreciation scale was higher than the percentages for
the remaining scales. Nevertheless, for this Literary
Understanding/Appreciation scale, greater percentages of subjects also
scored with relatively great discrepancy, and the 37% of subjects
whose difference scores were between 15 and
42% across the two testings achieved scores sufficiently discrepant to 
result in numerous inconsistent mastery decisions. 

The criterion validity of the CRT also was examined in this 
study. The traditional correlational analyses indicated that the 
criterion validity of the CRT scale scores with respect to the SRA
Reading Achievement Test was good, with 50% of correlations between
the CRT scales and the SRA subtests falling above .80. With the Word
Reading Test, correlations between the CRT scales and the Word Reading
Test scales were generally lower, with 50% falling above .70 and no
correlations above .80. 

With respect to both the SRA and the Word Reading Tests,
correlations based on the Word Identification and the Literary
Understanding/Appreciation scales typically were relatively low. The
finding that criterion validity with the SRA Test was greater than
with the Word Reading Test is contrary to previous findings (Fuchs et
al., 1983). Given that both the Word Reading Test and the CRT are
curriculum-based, one might expect a stronger relation between these
two measures, and current findings are surprising. Nevertheless,
correlations among curriculum-based measures and more global indices,
such as the SRA test, have been reported frequently at high levels
(Fuchs et al., in press; Fuchs, Fuchs, & Deno, 1982), and correlations
between the CRT and the SRA tests are comparable to those reported
earlier. Further, performance on the CRT predicts concurrent
performance on more global measures of reading proficiency better than
other basal mastery tests that have been examined (Fuchs et al., 1983;
Tindal et al., 1983). 

The criterion validity of the CRT also was investigated with the
criterion-referenced strategy of examining the relation between the
mastery-nonmastery classification on the CRT and actual pre-post
instructional status. Percentages of misclassifications ranged from
22% to 48% on the CRT scales, with 22% of students misclassified on
the total test. These figures suggest that, for classifying students
into groups for instruction within the basal reader for which the CRT
was designed, the Word Identification and Literary Understanding/
Appreciation scales have limited utility whereas the Comprehension and 
Study and Research scales as well as total test score are more valid. 

Consequently, the current study suggests that the Scott-Foresman
End-of-Book 9 CRT is generally of good quality. Much of the test
seemed useful both for predicting global reading proficiency and for
making decisions about student placement and progress within the
curriculum. Specifically, in the test-retest consistency analyses the
total score and the scale scores, with the exception of the Literary
Understanding/Appreciation scale, appeared adequate. In the criterion
validity analyses, the total and the scale scores, with the exception
of the Word Identification and the Literary Understanding/Appreciation
scales, demonstrated technical strength. This indicates that (a)
educators should use this CRT judiciously, relying primarily on the
Comprehension, Study and Research, and total scores for making
decisions about student performance and mastery in the curriculum, and
(b) test developers at Scott-Foresman might consider reexamining the
Word Identification and Literary Understanding/Appreciation scales.
In any case, the technical adequacy of the Scott-Foresman
End-of-Book 9 CRT was superior to previously examined basal mastery
tests (see Fuchs et al., 1983; Tindal et al., 1983). A probable 
explanation for this superiority is as follows: Whereas other CR 
tests that have been examined have teachers score and compute mastery 
scores on subtests as well as scale scores, the Scott-Foresman CRT
limits computation of mastery scores to the test scale and total
scores. By doing so, the Scott-Foresman CRT requires educators to
rely on information summarized across a relatively large sampling of
student behavior. As the Spearman-Brown formula indicates, when the
number of items in a test increases, the reliability and validity of
test scores improve correspondingly. It appears that the author of
this Scott-Foresman CRT has capitalized on this measurement phenomenon
by eliminating the calculation and consideration of mastery scores
based on subtests, which incorporate relatively few items.
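
The Spearman-Brown relation invoked above can be illustrated with a short sketch. The standard prophecy formula predicts the reliability of a test lengthened by a factor k as k r / (1 + (k - 1) r), on the assumption that the added items are parallel to the existing ones. The function name `spearman_brown` and the worked values are illustrative only; the lengthening scenario is hypothetical and is not an analysis reported in this study.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    ASSUMPTION: added (or removed) items are parallel to the existing ones.
    length_factor = new number of items / original number of items.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Doubling a test with reliability .50 predicts a reliability of 2/3.
doubled = spearman_brown(0.50, 2)

# Hypothetical: lengthening a 12-item scale with reliability .68 to 41 items.
lengthened = spearman_brown(0.68, 41 / 12)
```

Under the parallel-items assumption, the second call gives a predicted reliability of roughly .88, which shows why decisions based on longer scales tend to be more stable than decisions based on short subtests.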

Finally, in this study, results based on traditional and on
criterion-referenced strategies for examining test adequacy were
analogous. Wherever traditional correlational statistics suggested
relative strengths or weaknesses in scales, the results of the
alternative strategies paralleled findings. Consequently, results of
the present study echo previous research (Tindal et al., 1983), which
suggests that the two types of analyses corroborate, complement, and
enhance each other. It appears that both strategies may be
appropriate and necessary for investigating and describing the 
reliability and validity of criterion-referenced tests. 

Table 1

Examinees' Tasks on the Scott-Foresman End-of-Book 9 Test

Scale/Subtest                  Examinees' Tasks

Word Identification

  Context and Consonants       Read a sentence containing a word from
                               which all vowels and vowel teams have been
                               deleted. From an array of three choices,
                               select a complete word to substitute for
                               the incomplete one.

  Syllables, Accent            Read questions concerning the syllable
                               division and accented syllable of a word.
                               From two choices, select the answer to
                               each question.

  Compounds, Contractions      From an array of three choices, select
                               either a compound word or a contraction,
                               as directed.

  Root Words, Endings,         1. Given a word and three possible roots,
  Suffixes                        select the correct one.
                               2. Given a choice of five descriptions
                                  concerning what might happen to a
                                  root word before an ending is added
                                  and given a word with an ending,
                                  select the correct description.

Comprehension

  Unfamiliar Words             Read a paragraph containing an underlined
                               word, and select from among three choices
                               a synonymous word or phrase.

  Idioms                       Read a paragraph containing an underlined
                               idiom. From an array of three choices,
                               select a phrase that defines the idiom.

  Analogous Relationships      Read a story and (a) select, from an array
                               of three choices, a description of how
                               two objects described in the paragraph
                               are alike, and (b) select, from an array
                               of three choices, a word or noun phrase
                               that completes a sentence describing an
                               analogy.

  Story Problem/Solution       Read a story and answer multiple choice
                               questions concerning the story's content.

  Main Idea/Supporting         Read a short article. Determine whether
  Details                      sentences taken from that article are main
                               ideas, supporting details, or neither.

Study and Research

  Table of Contents            Given tables of contents from two books,
                               answer multiple choice questions concerning
                               (a) the content of the books and (b) how
                               to access information from them.

  Index                        Given a partial index from a book, use it
                               to answer multiple choice questions
                               concerning how to access information from
                               the book.

  Encyclopedia                 Given an illustration of a 21-volume
                               encyclopedia, answer multiple choice
                               questions concerning how to access
                               information from the encyclopedia.

  Footnotes                    Read a segment from a factual article that
                               contains footnotes, and answer multiple
                               choice questions concerning the content
                               and use of the footnotes.

  Headnotes                    Read a headnote and then answer multiple
                               choice questions concerning the content of
                               the headnote.

  Classifying                  From an array of four words, select the one
                               that does not belong with the others.

  Diagrams                     Given a diagram of a ship, answer multiple
                               choice questions concerning the layout
                               and contents of the ship.

Literary Understanding/Appreciation

  Story Elements               Read a story and answer multiple choice
                               questions concerning the story's content.

  Elements of Style            Read a paragraph and answer yes-no
                               questions concerning whether certain
                               sentences in the paragraph are
                               exaggerations.

  Elements of Style            Read a paragraph for which each sentence is
                               identified with a number. Then, answer
                               multiple choice questions concerning (a)
                               which sentence is a flashback, (b) where
                               personification is contained in the
                               paragraph, and (c) the point of view with
                               which the paragraph is written.

  Types of Literature          Read a story and select, from an array of
                               three choices, the literary form of the
                               story.

Table 2

Student Performance on Measures of Reading Achievement

Test                                     Items     Mean      SD

End-of-Book 9 Test (N=25)
  Word Identification                       36     29.8     5.6
  Comprehension                             25     18.3     5.3
  Study and Research                        41     30.8     8.5
  Literary Understanding/Appreciation       12      8.2     2.2
  Total                                    114     86.9    19.3

SRA Reading Achievement Test (N=22)
  Vocabulary                                       28.5     9.1
  Comprehension                                    34.5    11.2
  Total                                            63.5    20.1

Word Reading Test
  Isolated Word Reading (N=22)                     38.2    19.8
  Passage Reading (N=26)                           79.2    47.1

Table 3

Test-retest Reliabilities for Scott-Foresman End-of-Book 9 Test (N=25)

Scale                                    Reliability

Word Identification                          .93
Comprehension                                .92
Study and Research                           .93
Literary Understanding/Appreciation          .68
Total                                        .98



Table 4

Proportion of Subjects with Varying Percentages of Difference Scores
Across Two Administrations of the End-of-Book 9 Test (N=24)

                                               Percentage of Difference Score

                                                1.0 to   8.0 to  15.0 to  25.0 to  35.0 to
Scale                         Items(a)    0.0     7.0     14.0     24.0     34.0     44.0

Word Identification              36        17      55       28        0        0        0
Comprehension                    25        13      50       29        8        0        0
Study and Research               41        17      67        4       12        0        0
Literary Understanding/          12        25       0       38        8       25        4
  Appreciation

(a) Number of items on the test.



Table 5

Uncorrected and Corrected Proportions of Examinees (N=24) Placed
Into the Same Decision Categories on Two Administrations
of the End-of-Book 9 Test

                                           Proportion of Examinees

                                                        Corrected for
Scale                                    Uncorrected    Chance Agreement(a)

Word Identification                          .96            .96
Comprehension                                .92            .9[illegible]
Study and Research                           .83            [illegible]
Literary Understanding/Appreciation          .63            .57

(a) (Observed - Chance Proportions)/Maximum Value that (Observed - Chance
Proportions) Can Assume.

Table 6

Correlations Between End-of-Book 9 Test and SRA Test Scores (N=21)

                                                     SRA

Scott-Foresman Scale                  Vocabulary   Comprehension   Total

Word Identification                      .57           .62          .62
Comprehension                            .80           .86          .86
Study and Research                       .90           .87          .89
Literary Understanding/                  .48           .59          .55
  Appreciation
Total Test                               .92           .94          .95



Table 7

Correlations Between End-of-Book 9 Test and Word Reading
Test Scores (N=21)

                                             Word Reading Test

Scott-Foresman Scale                     Isolated Words    Passages

Word Identification                           .42             .70
Comprehension                                 .52             .70
Study and Research                            .73             .76
Literary Understanding/Appreciation           .58             .55
Total                                         .77             .84

Table 8

Relation Between End-of-Book 9 Test and Criterion Classification (N=23)

                                                               Percentage
Scale                                    Chi-square    Phi    Misclassified

Word Identification                          .68       .17         48
Comprehension                               6.88       .55         26
Study and Research                          9.79       .65         22
Literary Understanding/Appreciation          .52       .15         35
Total                                       9.79       .65         22

Figure 1. Displays of consistency of test scores on Word Identification
and Comprehension Scales of the End-of-Book 9 CRT.

Figure 2. Displays of consistency of test scores on Study and Research
and Literary Understanding/Appreciation Scales on the End-of-Book 9 CRT.
(Axis label: Difference in percentage of items correct on Literary
Understanding/Appreciation Scale.)